Applications in Plant Sciences
Wiley
Preprints posted in the last 90 days, ranked by how well they match Applications in Plant Sciences's content profile, based on 21 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.
Liu, S.; Zhang, W.; Yu, P.
Pangenome-level gene family identification often applies sequence similarity clustering without phylogenetic or synteny information, which risks biologically misleading evolutionary inferences. Using five transcription factor families (bHLH, MYB, NAC, WRKY, MADS-box) across 401 rice pangenome accessions, we compared clustering strategies: OrthoFinder alone, cd-hit alone, MMseqs2 alone, and OrthoFinder-informed refinement by cd-hit or MMseqs2. Methods solely based on sequence similarity merged distinct orthogroups and generated fewer orthogroups than approaches incorporating graph-based orthology. Conflicting cluster assignments, measured against OrthoFinder, varied strongly among families, from approximately 14% in MADS-box to approximately 57% in MYB, and were associated with protein length differences. Core, shell, and cloud gene classifications shifted substantially depending on the method, especially in MYB, NAC, and WRKY families. Critically, Ka/Ks distributions for core genes were highly method-sensitive, with orthology-aware methods yielding more convergent and less variable estimates of selective pressure, whereas noncore gene estimates remained robust. These findings demonstrate that neglecting graph-based orthogroup inference inflates methodological artifacts. We recommend a two-step strategy: initial graph-based orthogroup delineation followed by sequence similarity refinement to balance evolutionary accuracy and resolution in pangenome-scale gene family studies.
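The sensitivity of core, shell, and cloud classifications to the clustering method can be illustrated with a minimal sketch. The frequency thresholds (`core_min`, `cloud_max`), the function name, and the toy orthogroups below are illustrative assumptions, not values or code from the preprint. The example shows how merging two distinct orthogroups, as similarity-only clustering may do, turns two shell groups into one apparent core group.

```python
def classify_orthogroups(presence, n_accessions, core_min=0.95, cloud_max=0.05):
    """Label each orthogroup as core, shell, or cloud by accession frequency.

    presence: dict mapping orthogroup ID -> set of accession IDs carrying it.
    core_min and cloud_max are hypothetical thresholds chosen for illustration.
    """
    labels = {}
    for orthogroup, accessions in presence.items():
        frequency = len(accessions) / n_accessions
        if frequency >= core_min:
            labels[orthogroup] = "core"
        elif frequency <= cloud_max:
            labels[orthogroup] = "cloud"
        else:
            labels[orthogroup] = "shell"
    return labels


# Toy data: 40 accessions, two orthogroups found in disjoint halves.
n_accessions = 40
split = {"OG_A": set(range(0, 20)), "OG_B": set(range(20, 40))}
# The same genes after a similarity-only method merges the two groups.
merged = {"OG_AB": split["OG_A"] | split["OG_B"]}

split_labels = classify_orthogroups(split, n_accessions)
merged_labels = classify_orthogroups(merged, n_accessions)
print(split_labels, merged_labels)
```

Because downstream statistics such as Ka/Ks are summarized per category, a shift like this changes which genes contribute to the "core" distribution.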
Ayub, Y.; McGuire-Scullen, S.; Percival, S.; Weaver, W. N.; Karki, N.; Yahiaoui, W.; Astudillo-Pavon, K.; Barrios, A.; Check, J. C.; Colchado-Lopez, J.; Dolgikh, B. A.; Espinosa-Martinez, D. V.; Fu, Q.; Galvan-Lara, K. M.; Garcia-Chavez, J. N.; Garcia-Rios, S.; Grabb, C. N.; Guadir-Lara, G. E.; Hawkins, J. C.; Hendrickson, C. L.; Hightower, A. T.; Hurtado-Olvera, J. J.; Kianian, S.; Lennon, J.; Li, Z.; Li, J.; Lieb, B.; Lin, J.; Lopez-Sanchez, P.; Luna-Alvarez, M.; Martinez-Martinez, C.; Montemayor-Lara, A.; Moreno, N. A.; Obisesan, I. A.; Perez-Flores, O.; Pimentel-Ruiz, C.; Pineda-Hernandez,
(1) Rationale: Quantifying and predicting plant morphology is central to understanding development and evolution, yet many plant forms lack homologous features required for traditional morphometrics. We apply the Euler Characteristic Transform (ECT), an injective descriptor from topological data analysis, to encode 2D plant shapes. The ECT converts contours into image-like representations that preserve shape information while enabling deep learning. (2) Methods: We computed ECTs for large datasets of leaf and pavement cell shapes and used convolutional neural networks (CNNs) for classification. We also trained CNNs to approximate the inverse mapping, predicting leaf shape masks from radial ECTs. (3) Key results: ECT-based models achieved high classification accuracy, surpassing previous approaches on millions of herbarium-derived leaves. Notably, grapevine leaf venation was predicted from blade geometry alone, demonstrating that vascular structure is encoded in the outline. (4) Main conclusion: The ECT provides a compact, information-preserving representation of biological shape that integrates naturally with deep learning. It enables both accurate classification and predictive reconstruction, revealing latent morphological information and offering new opportunities to study plant form across scales.
Kuddar, O. S.; Meiklejohn, K. A.; Callahan, B. J.
Plant DNA metabarcoding enables the identification of plant taxa in mixed samples, with the trnL (UAA) intron and its P6 loop mini-barcode region performing as well as or better than other commonly used markers. Reliable metabarcoding requires high-quality reference databases, yet a regularly maintained trnL resource is currently lacking. Consequently, most studies use uncurated sequences downloaded directly from public repositories without essential validation. We address these gaps by providing guidance through a systematic comparison of three database curation tools (OBITools3/ecoPCR, RESCRIPt, and MetaCurator) to generate three trnL reference sequence databases and evaluate their classification performance across commonly sequenced trnL regions (CD, CH, and GH). Reference trnL sequences and taxonomy files were retrieved from public sequence repositories and curated using standardized filtering steps to reduce taxonomic errors, sequence ambiguity, and redundancy. Four simulated query datasets (two base sets and their mutated counterparts) were constructed to assess classification performance of the databases using the Naive Bayesian Classifier implemented in DADA2. The evaluation showed that performance differed by trnL region: MetaCurator and RESCRIPt yielded higher and similar metrics for trnL CD; OBITools3/ecoPCR and RESCRIPt were comparable for trnL CH; and MetaCurator attained the highest performance for the trnL GH region. All reference databases, taxonomy, and evaluation files are available at Zenodo (https://doi.org/10.5281/zenodo.17969450). The complete computational workflow and scripts are available on GitHub (https://github.com/oskuddar/trnL_DB). Although evaluation was focused on plant taxa in the United States, the resulting databases are suitable for use as global trnL reference databases.
Kilsztajn, Y.; Conceicao, L. H. S. d. M.; Proenca, C. E. B.; Vasconcelos, T. N. d. C.; Staggemeier, V. G.
Premise: Herbarium specimens are increasingly used to extract morphological traits for ecological and evolutionary studies, yet the effects of tissue desiccation on trait measurements remain poorly understood. Here, we tested whether higher tissue water content leads to greater measurement changes after herborization (H1) and whether fresh trait values can be reliably predicted from herbarium measurements (H2). Methods: We evaluated the reliability of herbarium-based measurements by comparing fresh and dried traits of leaves, flowers, fleshy fruits, and seeds across 262 individuals representing 133 Neotropical Myrtaceae species. Phylogenetic least squares models and machine-learning regressions were used to test H1 and H2. Results: Leaves and flowers generally shrank after herborization, fruit size metrics tended to increase, and seeds were largely unaffected. Water content was significantly associated with the magnitude of herborization effects in flowers and some leaf and seed traits. Fresh trait values were accurately predicted from herbarium measurements. Prediction errors were lowest for leaf traits, followed by fruits, flowers, and seeds. Discussion: These results partially support H1 and support H2, indicating that herbarium specimens can be reliably used for trait analyses when organ-specific responses are considered, providing a practical framework to account for potential desiccation bias in functional trait research.
Konrai, K.; Ito, R.; Sunayama, S.; Omura, K.; Isagi, Y.; Kitajima, K.; Onoda, Y.
Premise: Elliptic Fourier analysis is widely used to quantify leaf shape variation, but inconsistent normalization and orientation alignment can introduce biologically irrelevant variation. In addition, a reproducible workflow from raw images to normalized elliptic Fourier descriptors (EFDs) is still lacking. Methods and Results: We developed LeafContourEFD, a GUI application for reproducible leaf morphometrics. It integrates image segmentation, contour extraction, EFD calculation, and an extended normalization framework, termed oriented true EFD normalization, based on a user-defined biological reference axis. Analyses of Quercus serrata, Q. crispula, and Triadica sebifera showed that existing normalization methods can introduce orientation-related variance when the first-harmonic major axis does not match the leaf base-to-tip axis. In contrast, oriented true normalization removed these artifacts, yielding clearer shape transitions along principal components and allowing shape variation among leaves to be captured while preserving biologically meaningful lateral asymmetry. Conclusions: LeafContourEFD improves interpretability and reproducibility in outline-based morphometrics and provides transparent outputs and metadata for data sharing and cross-study comparisons.
Pelosi, J.; Yanez, A.; Veldhuisen, L. N.; Dant, A.; Northing, P. C.; Bland, R. G. W.; Testo, W. L.; Dlugosch, K. M.
Background and Aims: Non-native species are now ubiquitous members of regional floras. The factors that lead to establishment and dominance of non-native species are continuously debated. Fundamental hypotheses about drivers of invasion success include the role of phylogeny, polyploidy, genome size, and rapid niche evolution. These hypotheses have been tested in the seed plants, but ferns, the second largest group of vascular plants, have rarely been considered in these analyses, despite making up a non-trivial portion of non-native floras. Methods: We compiled a dataset of global non-native ferns and categorized them along the invasion spectrum using descriptions from the literature and natural history collections. Using this dataset, we assessed I) the taxonomic diversity and phylogenetic clustering of non-native ferns, II) the geographic distribution of fern introductions, testing for shifts in climatic niches, and III) the association of invader traits across the invasion continuum, including smaller genome sizes and higher ploidal levels. Key Results: We generated a dataset that includes 83 taxa; of these, we classified 18 as casual, 35 as naturalized (but not invasive), and 30 as invasive. Using this dataset, we found that I) phylogenetic clustering of non-native ferns is weak or absent, II) some regions are overrepresented as sources and recipients of introductions, III) climatic niches are often conserved between native and introduced ranges, but can differ between introductions, IV) naturalized ferns have smaller genomes, and V) invaders have higher ploidal levels. Conclusions: We integrated regional floras, occurrence and climate data, phylogeny, and cytology to test fundamental hypotheses regarding the colonization success of ferns. This study provides insights into the ecological, genomic, and phylogenetic features associated with the colonization of new habitats by non-native ferns, a largely overlooked portion of non-native plant taxa.
Kowal, J.; Upham, R.; Kiani, A.; Rickards, M.; Serpell, E.; Bidartondo, M. I.; Evangelisti, E.; Schornack, S.; Sibbit, J.; Treder, K.; Weidinger, S.; Suz, L. M.
Root colonisation by endomycorrhizal fungi can indicate habitat condition. However, due to the significant time required to assess colonisation using traditional microscope techniques, studies of colonisation at large scales are impractical. AI-powered approaches may increase output and facilitate ecosystem assessments. We trained our AI-powered tool MycorrhizaFinder (MFKew) on field roots from diverse ecosystems. It was trained to recognise a range of arbuscular and ericoid mycorrhizal fungal structures, and to differentiate dark septate endophytes common in field-sourced roots. Here we describe the semi-automated workflow from root processing and microscope slide scanning to model training and performance evaluation, proposing Macro F1 as the appropriate metric to be optimised. Without human supervision, Macro F1 currently stands at 66% for arbuscular and at 57% for ericoid mycorrhizal colonisation assessment. MFKew is user friendly, requires no programming skills and offers flexibility for advanced users who wish to further train the tool using their own labelled mycorrhizal root datasets, including images acquired from different devices or staining protocols. This adaptability allows users to customize the model for specific needs, making it optimal for ecologists and agronomists. Additionally, MFKew supports large-scale, repeatable, medium-throughput monitoring across ecosystems, enabling the assessment of mycorrhizal status and tracking changes over time.
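For readers unfamiliar with the metric named above, Macro F1 is the unweighted mean of per-class F1 scores, so rare structures count as much as common ones. The following is a minimal sketch of that standard definition; the function name and class labels are hypothetical and are not taken from MFKew.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores over all observed classes."""
    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            true_pos[truth] += 1
        else:
            false_pos[guess] += 1   # predicted class gains a false positive
            false_neg[truth] += 1   # true class gains a false negative
    scores = []
    for label in sorted(set(y_true) | set(y_pred)):
        denominator = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class has no support.
        scores.append(2 * true_pos[label] / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Hypothetical labels for four root segments.
truth = ["arbuscule", "arbuscule", "hypha", "none"]
guess = ["arbuscule", "hypha", "hypha", "none"]
score = macro_f1(truth, guess)
print(round(score, 3))
```

In this toy case the per-class F1 values are 2/3, 2/3, and 1, giving a Macro F1 of 7/9.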
Escobar, K.; Stiller, J.; Cardenas, P. D.
Background and Aims: Complex genomic histories driven by hybridization and polyploidy can shape key plant traits such as defense, stress tolerance, and toxicity, particularly in Amaranthaceae, which includes crops such as quinoa and spinach. Within this family, white goosefoot (Chenopodium album) is both a widespread agricultural weed and a traditional food resource. However, its evolutionary history is complicated by discordant signals among genomic markers within the C. album complex, comprising diploid, tetraploid, and hexaploid taxa. Here, we tested whether reticulate evolution underlies this genome-wide discordance. Methods: Using genome-scale phylogenomic data, we analysed 2,298 conserved nuclear loci (BUSCO genes) across 27 Amaranthaceae genomes. Both single- and multicopy gene families were included to capture signals of gene duplication, incomplete lineage sorting, and hybridization. Complementary phylogenomic approaches were used to evaluate whether the evolutionary history is best supported by strictly bifurcating relationships or by reticulate evolution. Key Results: A consistent C. album lineage was recovered, comprising tetraploid and hexaploid C. album cytotypes together with C. suecicum, C. strictum, C. formosanum, C. acuminatum, and C. opulifolium. Phylogenetic discordance was concentrated within Chenopodium, particularly around the C. album and C. quinoa lineages. Models incorporating hybridization fit better than strictly bifurcating relationships, supporting at least two reticulation events. Hybridization signals were detected in 271 loci in tetraploid and 270 in hexaploid C. album, of which 232 were shared, indicating a shared hybrid origin rather than independent lineages. Conclusions: The evolutionary history of the C. album lineage is best explained by reticulate processes involving hybridization and polyploidy. Conserved nuclear loci retain persistent signatures of these events, helping to resolve complex evolutionary histories in polyploid plant systems.
Hipp, A. L.; Althaus, K. N.; Fuller, E. L.; Hahn, M.; Larson, D. A.; Mohn, R. A.; Wang, B.; Manos, P. S.
Forest trees pose numerous potential challenges to phylogenomic inference. Their large effective population sizes and relatively long generation times lead to deep allele coalescence and consequently incomplete lineage sorting (ILS), which biases inferences of divergence times toward older ages and introduces gene tree discordance. Deep phylogenetic divergences, reaching back into the Paleocene, introduce reference-mapping biases. Introgression, the movement of genes between lineages, may result in different phylogenies being inferred depending on which individuals are included in analysis, even if the plurality of the genome favors the divergence history unaffected by introgression. These factors influence phylogenetic inference across the Tree of Life but are particularly prevalent in forest trees. Oaks (Quercus) are notable for all three influences. In addition, our knowledge of the oak phylogeny is currently based strongly on restriction site associated DNA sequencing (RADseq) datasets published over the past decade, which may introduce additional sources of uncertainty. In this chapter, we analyze a 322-species RADseq dataset and genome resequencing data from across the genus to address sources of uncertainty in our understanding of the global oak phylogeny, which we hope will serve as a model for other research groups working on comparable woody plant groups.
Ramos, R. J.; Afkhami, M. E.; Aguilar-Trigueros, C. A.; Barbour, K. M.; Chaverri, P.; Cuprewich, S. A.; Egan, C. P.; Lynn, K. M. T.; Peay, K. G.; Norros, V.; Romero-Olivares, A. L.; Ward, L.; Chaudhary, B.
This paper presents a novel workflow leveraging Large Language Models (LLMs) to rapidly extract trait data from fungal species descriptions, addressing a significant bottleneck in ecological research. We developed and evaluated an LLM pipeline to extract morphological trait data from arbuscular mycorrhizal fungi, comparing performance against a manually curated dataset (TraitAM). Results demonstrate the potential of LLMs for automated trait data acquisition, though accuracy varies by trait and model, with systematic biases observed. This framework offers a blueprint for building trait databases across diverse taxa and domains, significantly accelerating ecological research and conservation efforts.
Tan, D.
Accurate quantification of leaf lesion severity is essential for plant disease research and phenotyping but is often limited by subjective visual scoring and time-intensive manual image analysis. We present LIME, a fully automated, open-source image analysis pipeline for high-throughput quantification of leaf lesions from disease assay images. LIME integrates zero-shot leaf segmentation using the Segment Anything Model with a convolutional neural network for lesion area estimation. Applied to Arabidopsis thaliana leaves infected with Sclerotinia sclerotiorum, the proposed approach achieved a mean absolute percentage error of 12.9%, comparable to observed intrarater variability in manual scoring. Stratified evaluation across lesion-size groups demonstrated consistent prediction accuracy for small, intermediate, and large lesions, and comparative analysis showed that the deep learning-based model substantially outperformed color-based baseline methods. Under GPU-accelerated execution, LIME processed complete assays containing approximately 200 leaves in 15 minutes, representing an approximate 13-fold reduction in processing time relative to manual annotation. Together, these results indicate that LIME enables objective, reproducible, and scalable quantification of leaf lesion severity in standardized plant pathology assays. The pipeline is released as an open-source tool to support quantitative phenotyping studies.
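The accuracy figure quoted above is a mean absolute percentage error (MAPE). As a minimal sketch of how that statistic is computed from paired measurements, assuming manually annotated lesion areas as the reference: the function name and the example areas are hypothetical and are not drawn from the LIME pipeline or its data.

```python
def mean_absolute_percentage_error(reference, predicted):
    """Return MAPE in percent for paired reference and predicted values."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(value == 0 for value in reference):
        # MAPE is undefined when a reference value is zero.
        raise ValueError("reference values must be non-zero")
    relative_errors = [
        abs((ref - pred) / ref) for ref, pred in zip(reference, predicted)
    ]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Hypothetical lesion areas (e.g. in mm^2) for three leaves.
manual_areas = [10.0, 20.0, 40.0]
model_areas = [9.0, 22.0, 40.0]
error_percent = mean_absolute_percentage_error(manual_areas, model_areas)
print(round(error_percent, 2))
```

Because each error is scaled by its reference value, small lesions can dominate the average, which is one reason a stratified evaluation by lesion size is informative.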
Reinales, S.; Forest, F.; Zuntini, A.; Cardoso, D.; Ballen, G. A.; Cardenas, D.; Pirani, J. R.
Obtaining large and well-resolved phylogenetic trees for neotropical clades is challenging, as many species inhabit remote regions, and sampling often relies on herbarium specimens with highly degraded DNA. Target capture provides an effective solution for retrieving molecular data from fragmentary material. However, data processing using tools generally designed for diploid organisms and single-copy loci is also challenging, particularly when events such as genome duplication and hybridisation have shaped the lineage evolution. We used dual-hybridisation to integrate Ochnaceae-specific and universal probes to reconstruct the phylogenetic relationships of Sauvagesieae, a pantropical clade with ca. 90 species mainly distributed in the northern Andes, the Brazilian Espinhaco Range, and the Amazon-Guyana region. We tested different filtering strategies involving missing data and paralogs to assess probable sources of tree discordance and topological uncertainty. We found no significant benefit in reducing tree discordance after removing entire genes due to the presence of paralogs or a high amount of missing data. Removing fragmentary sequences instead improved alignments and increased branch support of gene trees. By quantifying the proportion of SNPs, analysing the distribution of the allele frequencies, and gene-tree quartet frequencies, we found evidence of polyploidisation and hybridisation, which could reduce resolution at internal nodes, particularly in mountain clades. Our results underscore the importance of exploring the complexities of target-capture data, not only to improve phylogenetic resolution but also to understand the sources of phylogenetic conflict and the underlying molecular evolutionary processes.
Dant, A.; Pelosi, J.; Northing, P. C.; Dlugosch, K. M.
Premise: Centaurea melitensis (Asteraceae) is a problematic invader of grasslands globally, but little is known about its genetic makeup. Here we develop a reference genome to facilitate studies of its invasion history, genetic variation, and evolution. Methods: Inbred offspring of a single individual of C. melitensis from its invasion of California, USA were used for flow cytometry to estimate genome size, and for genomic DNA extraction. DNA was sequenced with PacBio HiFi technology (yield = 85.7 Gb). The genome was assembled with Hifiasm and annotated with BRAKER3. GENESPACE was used to compare gene order (synteny) with three other species within the subfamily Cichorioideae. Results: We estimated a mean genome size of 795.0 Mbp for C. melitensis, and our assembly totaled 696.6 Mbp in 48 contigs (N50 = 55.6 Mbp; BUSCO = 98%), with annotation of 25,157 protein-encoding genes. This included four telomere-to-telomere putative chromosomes, nine additional chromosome arms terminated by telomeric repeats, and a complete chloroplast genome. Synteny varied markedly across the genus and subfamily, suggesting a dynamic history of structural variation in the lineage of C. melitensis. Discussion: We provide a highly complete and contiguous genome assembly to facilitate the further study of genomic variation in C. melitensis.
Gaar, S.; Müller, C.; Dussarrat, T.
Herbivory is a major biotic stress for plants, triggering the induction and modulation of diverse specialized metabolites. Such induction responses are well studied for leaves and have been shown to depend on the herbivore feeding mode. Little is known about changes in flower metabolites and chemodiversity due to florivory type. Moreover, we lack an understanding of the intraspecific variation in such responses and whether these are spatially structured. The aromatic plant Tanacetum vulgare, which shows high intraspecific chemodiversity in terpene profiles, was used to examine chemotype-specific metabolic responses of flower heads to infestation by the inflorescence-infesting aphid Macrosiphoniella tanacetaria or the flower-feeding beetle Olibrus spp. under field conditions. At peak flowering, each plant received both florivory treatments on separate stems, leaving one stem herbivore-free as a control. After four days, flower heads were harvested to analyze terpenes (GC-MS) and metabolic fingerprints (LC-MS). We found stem-specific floral metabolic responses, with florivory altering specific chemical families and their chemodiversity. Levels of a few terpenes decreased following infestation, while none increased. Untargeted analyses revealed that aphid infestation had a lower effect on flower chemistry than beetle infestation, with aphid infestation mainly causing decreases and beetle infestation predominantly leading to increases in some metabolite intensities, but little overlap across treatments and chemotypes. Our results demonstrate that floral metabolic responses to florivory are spatially structured, florivore type-specific and shaped by plant chemotype. These findings highlight that the interplay between vascular organization, insect feeding mode, and intraspecific chemodiversity governs how flowers adjust their chemical defenses.
One-sentence summary: Tanacetum vulgare showed chemotype-specific responses to florivory by aphids (Macrosiphoniella tanacetaria) and beetles (Olibrus spp.), with aphids causing decreased and beetles increased levels of metabolic features within the same plant individuals, with little overlap in significant features across chemotypes.
Herrighty, E. M.; Specht, C. D.; Gore, M. A.; Solano, L.; Estrada-Gamboa, J.; Hernandez, C. E.; Tufan, H. A.; Landis, J. B.
Understanding crop genetic diversity is essential for conservation and breeding, yet farmer-maintained germplasm remains largely underrepresented in genomic studies. Theobroma cacao L. has a complex domestication history and extensive global diversity, and cacao currently cultivated in Central America, particularly in Costa Rica, has been understudied compared to South American and Mexican cultivars despite cultural and historical importance. In this study, we investigate the genetic diversity of cacao from farmer-managed systems across Costa Rica to search for Criollo germplasm and identify and characterize any unique local genetic groups. Ninety-four trees were sampled from 17 farms across four regions of the country and sequenced using whole genome resequencing. Farmer materials were analyzed alongside 166 previously characterized reference accessions representing major cacao genetic groups. Population structure analyses, phylogenetic reconstruction, and network approaches revealed that Costa Rican cacao encompasses multiple known genetic groups, including Criollo-derived lineages, while also harboring locally distinct diversity not fully represented in current global reference collections. Analyses revealed close kinship between many accessions with no clear geographic patterns corresponding to the observed population differentiation, reflecting the effects of farmers in creating dominant patterns of gene flow through seed-saving, clonal propagation, and sharing genotypes among farms. Heterozygosity levels varied substantially among individuals, consistent with a mixture of highly inbred Criollo trees and more heterozygous, admixed genotypes. We find that farmer-managed cacao systems are reservoirs of genetic diversity, including possibly rare or historically important lineages, underscoring the value of these farming systems for effective conservation and management of genomic resources for cacao resilience and improvement.
Peng, S.; Inouye, B. D.; Ramirez-Parada, T.; Mazer, S. J.; Record, S.; Ellison, A. M.; Davis, C. C.
Long-term field observations typically are the "gold-standard" for inferences of phenological sensitivities in montane systems but are spatially limited. Herbarium specimens provide broader spatial coverage, but their utility to accurately capture montane phenology remains poorly known. We compared flowering phenology of 45 species inferred from herbarium specimens with comparable data from nearly 50 years of direct observations at the Rocky Mountain Biological Laboratory. Estimates of flowering time and phenological sensitivity to snow density were consistent between herbarium specimens and observations, but observations revealed secondary flowering peaks. Herbarium specimens additionally yielded shallower estimates of phenological sensitivity to spring temperature than did field observations. Across co-occurring species, "early" flowering individuals inferred from herbarium specimens, rather than the mean response across all individuals, may better approximate community-level phenological responses to temperature changes. We conclude that herbarium specimens are reliable resources for closing gaps in understanding phenological variation along elevational gradients of montane systems.
Leone, M.; Rech De Laval, V.; Drage, H. B.; Waterhouse, R. M.; Robinson-Rechavi, M.
Integrating taxonomic data from various sources presents a significant challenge in biodiversity research, due to non-standardized nomenclature and evolving species classifications. Discrepancies between major repositories like the Global Biodiversity Information Facility (GBIF) and the National Center for Biotechnology Information (NCBI), as well as citizen science platforms such as iNaturalist, lead to fragmented and sometimes inaccurate biological data. We present TaxonMatch, a tool designed to address these challenges. TaxonMatch aligns taxonomic names, resolves synonymy, and corrects typographical and structural inconsistencies across databases. We show how it can be used to build a common backbone arthropod taxonomy over NCBI, GBIF and iNaturalist, to find the closest molecular data to a given fossil, and to identify IUCN endangered species with molecular data. TaxonMatch provides a cohesive taxonomic framework and a consistent taxonomic backbone, and can be applied to any taxonomic source. The tool is available at https://github.com/MoultDB/TaxonMatch.
Villa-Machio, I.; Masa-Iranzo, I.; Nürk, N. M.; Pokorny, L.; Meseguer, A. S.
The combination of target capture sequencing (TCS) with low-coverage whole genome sequencing (lcWGS), an approach known as Hyb-Seq, has allowed the integration of natural history collections into the genomics revolution, transforming biodiversity research. To implement Hyb-Seq, a collection of genomic targets, often nuclear orthologs, is needed to design probes for TCS. In flowering plants, the universal Angiosperms353 probe set has proven informative at multiple evolutionary scales, with caveats. Malpighiales is known to be one of the most challenging flowering plant orders to resolve. Within this order, the clusioid clade (~2.2K species, 94 genera, five families) is no exception. To resolve phylogenetic relationships in this recalcitrant clade, we design a custom probe set, the Clusioids626 kit, composed of 39,936 120-mer probes targeting 626 nuclear orthologs (~6.6M nucleotides). This probe set includes all Angiosperms353 targets and 273 clusioid-specific ones, carefully chosen taking copy-number, length evenness, and phylo-informativeness into account. We test our probe set on 70 accessions representing all families and tribes in the clusioid clade. On average, 50.4% of TCS reads mapped to our targets, recovering a median of ~600 orthologs. Relationships for all clusioid families are fully resolved for our nuclear targets. Additionally, 105 plastid coding DNA sequences were retrieved from the lcWGS fraction. A strong cyto-nuclear conflict was detected. The Clusioids626 kit performs better than the universal Angiosperms353 enrichment panel alone. Our kit design workflow can be extended into other lineages for which a universal probe set exists but more resolution is needed.
Prouvost, A.; Connesson, L.; Le Gourrierec, T.; Freville, H.; David, J.; Plessis, C.; Magnier, B.
Accurate and reproducible assessment of foliar disease severity is essential for evaluating the performance of heterogeneous plant communities and understanding host-pathogen interactions. However, traditional visual scoring methods remain subjective, imprecise, and difficult to scale in large phenotyping experiments. Here, we present a semi-automated image analysis workflow designed to quantify multiple foliar disease symptoms simultaneously on wheat flag leaves sampled from varietal mixtures. The workflow combines three methodological components: (i) a standardized protocol for leaf sampling and imaging, (ii) supervised machine learning segmentation using Random Forest implemented in Ilastik to classify multiple symptoms (powdery mildew and yellow rust), and (iii) a graphical user interface facilitating pipeline deployment by non-specialist operators. To evaluate the influence of image representation on classification performance, four color spaces (RGB, HSV, HLS, LAB) were systematically compared. The approach was validated using images of durum wheat flag leaves collected from a field experiment assessing eight-way varietal mixtures under natural fungal pressure. Cross-validation against manually annotated images demonstrated high segmentation accuracy across all symptoms. Comparison among color spaces revealed only minor differences in performance. Overall, this workflow offers a cost-effective, annotation-efficient and reproducible alternative to deep learning approaches, leveraging open-source and actively maintained tools while requiring limited training data and enabling objective, reproducible and scalable disease phenotyping.
Kesälahti, R.; Cervantes, S.; Niskanen, A.; Pyhäjärvi, T.
Genomic imprinting is a rare epigenetic phenomenon in plants and animals, defined by parent-of-origin specific gene expression. Its molecular mechanisms and evolutionary significance remain incompletely understood. In this study, we investigated whether genomic imprinting occurs in Scots pine and, by extension, in other conifers to gain insight into the evolutionary origins of imprinting. We performed reciprocal crosses to assess imprinting in seed embryos and applied a unique approach that used exome-capture data from the haploid, maternally inherited megagametophyte tissue to identify maternal alleles, thereby allowing us to infer paternal alleles in the embryos of the same seeds. Our findings show that maternally inherited haploid megagametophyte tissue offers an effective strategy for resolving parental alleles in offspring while simultaneously removing extensive paralogous variation from the dataset. This framework is broadly applicable to other conifer species and to taxa that possess comparable maternally derived haploid tissues. No evidence of genomic imprinting was detected. Although the limited overlap between the exome-capture and RNA-sequencing datasets and the stringent paralog filtering reduced the amount of analyzable data considerably, the absence of detectable imprinting may also reflect genuinely weak or absent imprinting signals in conifers. We identified several limitations in this preliminary study and outline recommendations for future work to overcome them, and additional research will be necessary to determine whether genomic imprinting occurs in conifers.